Parametric datatype-genericity

Parametric datatype-genericity. Jeremy Gibbons and Ross Paterson. Submitted for publication.

Datatype-generic programs are programs that are parametrized by a datatype or type functor. There are two main styles of datatype-generic programming: the Algebra of Programming approach, characterized by structured recursion operators parametrized by a shape functor, and the Generic Haskell approach, characterized by case analysis over the structure of a datatype. We show that the former enjoys a kind of parametricity, relating the behaviours of generic functions at different types; in contrast, the latter is more ad hoc, with no coherence required or provided between the various clauses of a definition.

How could we not have mentioned this before?

The main result of this paper is that fold is a higher-order natural transformation. This means, the authors explain, that fold is a rather special kind of datatype-generic operator, both enjoying and requiring coherence between its datatype-specific instances.

We have had several long discussions about the uniqueness of fold; they may serve as an introduction for those new to this topic. The tutorial on the universality of fold is, of course, on the papers page.
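As a refresher on the first of those two styles, here is a minimal Haskell sketch (mine, not the paper's): a single fold parametrized by a shape functor, together with the universal property that pins the list instance down uniquely.

    {-# LANGUAGE DeriveFunctor #-}

    -- One generic fold, parametrized by a shape functor f: the Algebra of
    -- Programming style that the paper contrasts with Generic Haskell.
    newtype Fix f = In { out :: f (Fix f) }

    fold :: Functor f => (f a -> a) -> Fix f -> a
    fold alg = alg . fmap (fold alg) . out

    -- Instantiating the shape functor at lists of Int:
    data ListF a r = NilF | ConsF a r deriving Functor
    type List a = Fix (ListF a)

    sumList :: List Int -> Int
    sumList = fold alg
      where
        alg NilF        = 0
        alg (ConsF n s) = n + s

    -- The universal property that makes foldr unique, not merely one
    -- definition among many:  g = foldr f v  if and only if
    --   g []     = v
    --   g (x:xs) = f x (g xs)

    main :: IO ()
    main = print (sumList (In (ConsF 1 (In (ConsF 2 (In NilF))))))  -- 3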

For some reason I feel a craving for bananas...

DySy: Dynamic Symbolic Execution for Invariant Inference

DySy: Dynamic Symbolic Execution for Invariant Inference. Christoph Csallner, Nikolai Tillmann, Yannis Smaragdakis.

Dynamically discovering likely program invariants from concrete test executions has emerged as a highly promising software engineering technique. Dynamic invariant inference has the advantage of succinctly summarizing both “expected” program inputs and the subset of program behaviors that is normal under those inputs. In this paper, we introduce a technique that can drastically increase the relevance of inferred invariants, or reduce the size of the test suite required to obtain good invariants. Instead of falsifying invariants produced by pre-set patterns, we determine likely program invariants by combining the concrete execution of actual test cases with a simultaneous symbolic execution of the same tests. The symbolic execution produces abstract conditions over program variables that the concrete tests satisfy during their execution. In this way, we obtain the benefits of dynamic inference tools like Daikon: the inferred invariants correspond to the observed program behaviors. At the same time, however, our inferred invariants are much more suited to the program at hand than Daikon’s hardcoded invariant patterns. The symbolic invariants are literally derived from the program text itself, with appropriate value substitutions as dictated by symbolic execution. We implemented our technique in the DySy tool, which utilizes a powerful symbolic execution and simplification engine. The results confirm the benefits of our approach. In Daikon’s prime example benchmark, we infer the majority of the interesting Daikon invariants, while eliminating invariants that a human user is likely to consider irrelevant.

The Daikon invariant detector mentioned in the abstract is here.

Seems like a pretty straightforward idea, I guess, but as always: God is in the details.
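To make the combination tangible, here is a toy sketch in Haskell of running a test concretely while shadowing it symbolically. This is my own illustration of the general technique, not DySy's machinery; all the names are hypothetical.

    -- Every value carries both a concrete Int and a symbolic term;
    -- branches record the path condition the concrete run actually took.
    data Sym = Var String | Lit Int | Add Sym Sym | Le Sym Sym | Not Sym
      deriving Show

    data Dual = Dual { conc :: Int, sym :: Sym }

    -- Arithmetic is shadowed symbolically.
    add :: Dual -> Dual -> Dual
    add (Dual c1 s1) (Dual c2 s2) = Dual (c1 + c2) (Add s1 s2)

    -- A comparison returns the concrete outcome plus the symbolic condition.
    le :: Dual -> Dual -> (Bool, Sym)
    le (Dual c1 s1) (Dual c2 s2) = (c1 <= c2, Le s1 s2)

    -- Program under test: clamp negative inputs to zero.
    clamp :: Dual -> (Dual, [Sym])            -- result and path conditions
    clamp x = case le x (Dual 0 (Lit 0)) of
      (True,  c) -> (Dual 0 (Lit 0), [c])
      (False, c) -> (x, [Not c])

    -- Running the concrete test x = 5 yields the condition Not (Le x 0):
    -- a fact read off the program text itself, with value substitutions
    -- performed by the symbolic execution, as the abstract describes.
    main :: IO ()
    main = print (snd (clamp (Dual 5 (Var "x"))))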

Quantifying the Performance of Garbage Collection vs. Explicit Memory Management

I think the real culprit behind the "poor performance" of GC languages is demonstrated in Quantifying the Performance of Garbage Collection vs. Explicit Memory Management, by Matthew Hertz and Emery D. Berger:

Garbage collection yields numerous software engineering benefits, but its quantitative impact on performance remains elusive. One can measure the cost of conservative garbage collection relative to explicit memory management in C/C++ programs by linking in an appropriate collector. This kind of direct comparison is not possible for languages designed for garbage collection (e.g., Java), because programs in these languages naturally do not contain calls to free. Thus, the actual gap between the time and space performance of explicit memory management and precise, copying garbage collection remains unknown.

We take the first steps towards quantifying the performance of precise garbage collection versus explicit memory management. We present a novel experimental methodology that lets us treat unaltered Java programs as if they used explicit memory management. Our system generates exact object reachability information by processing heap traces with the Merlin algorithm [34, 35]. It then re-executes the program, invoking free on objects just before they become unreachable. Since this point is the latest that a programmer could explicitly free objects, our approach conservatively approximates explicit memory management. By executing inside an architecturally-detailed simulator, this “oracular” memory manager eliminates the effects of trace processing while measuring the costs of calling malloc and free.

We compare explicit memory management to both copying and non-copying garbage collectors across a range of benchmarks, and include real (non-simulated) runs that validate our results. These results quantify the time-space tradeoff of garbage collection: with five times as much memory, an Appel-style generational garbage collector with a non-copying mature space matches the performance of explicit memory management. With only three times as much memory, it runs on average 17% slower than explicit memory management. However, with only twice as much memory, garbage collection degrades performance by nearly 70%. When physical memory is scarce, paging causes garbage collection to run an order of magnitude slower than explicit memory management.

Overall, generational collectors can add up to 50% space overhead and 5-10% runtime overhead if we ignore virtual memory. That seems very reasonable given the software engineering benefits. However, factoring in virtual memory, with its attendant page faults, paints a very different picture in section 5.2:

These graphs show that, for reasonable ranges of available memory (but not enough to hold the entire application), both explicit memory managers substantially outperform all of the garbage collectors. For instance, pseudoJBB running with 63MB of available memory and the Lea allocator completes in 25 seconds. With the same amount of available memory and using GenMS, it takes more than ten times longer to complete (255 seconds). We see similar trends across the benchmark suite. The most pronounced case is javac: at 36MB with the Lea allocator, total execution time is 14 seconds, while with GenMS, total execution time is 211 seconds, over a 15-fold increase.

The culprit here is garbage collection activity, which visits far more pages than the application itself [60].

Considering the prevalence of virtual memory, this poses a significant problem for garbage-collected languages, server-side systems in particular. LTU has previously covered work by the same authors on garbage collectors that cooperate with virtual memory managers.
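As for the methodology itself, the oracle is easy to caricature in a few lines of Haskell. This is nothing like the real Merlin-based trace processing, just the shape of the idea: compute each object's last-reachable time, then schedule a free immediately after it.

    import qualified Data.Map.Strict as M

    type Time = Int
    type Obj  = String

    -- Last time each object appears as reachable in the trace.
    lastReachable :: [(Time, Obj)] -> M.Map Obj Time
    lastReachable trace = M.fromListWith max [ (o, t) | (t, o) <- trace ]

    -- The oracle frees each object just after its last-reachable point,
    -- the latest moment an explicit free could legally have been issued.
    freeSchedule :: [(Time, Obj)] -> [(Time, Obj)]
    freeSchedule trace =
      [ (t + 1, o) | (o, t) <- M.toList (lastReachable trace) ]

    main :: IO ()
    main = mapM_ print (freeSchedule [(0, "a"), (1, "b"), (2, "a")])
    -- "a" is freed at time 3 (just after its last use at 2), "b" at time 2.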

Evolutionary Programming and Gradual Typing in ECMAScript 4

Evolutionary Programming and Gradual Typing in ECMAScript 4, by Lars T Hansen, Adobe Systems, is an 18-page tutorial detailing features of ECMAScript 4 and how they can be applied to existing ECMAScript code to improve it.

This tutorial uses a simple library as a running example to illustrate the evolution of a program from the ES3 "script" stage, via various levels of typing and rigor, to an ES4 package with greater guarantees of integrity and better performance potential than the original code. We then look at how union types, generic functions, and reflection can be used to work with a library whose code we can't modify.

Logic for Philosophy

A draft textbook by Theodore Sider, aimed at philosophy graduate students. While not as technical as computer scientists are used to, it may be of interest due to its explicit discussion of extensions (e.g., modal operators), deviations (e.g., multi-valued logic), and variations (such as the Sheffer stroke) on basic propositional logic.
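To give one flavor of the "variations": the Sheffer stroke is NAND, and it is expressively complete on its own, since the usual connectives are all definable from it:

    \neg p \equiv p \mid p, \qquad
    p \wedge q \equiv (p \mid q) \mid (p \mid q), \qquad
    p \vee q \equiv (p \mid p) \mid (q \mid q)

where $p \mid q$ abbreviates $\neg(p \wedge q)$.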

The book includes chapters on counterfactuals and two-dimensional modal logic that may include material new to PLT wonks.

Clojure

Haven't noticed it being mentioned here yet: in the actor-concurrency vein, Clojure is vaguely like Erlang, but targeting the JVM. Some comparisons vs. Erlang have been made.

OCaml Light: A Formal Semantics For a Substantial Subset of the Objective Caml Language

OCaml Light: a formal semantics for a substantial subset of the Objective Caml language.

OCaml light is a formal semantics for a substantial subset of the Objective Caml language. It is written in Ott, and it comprises a small-step operational semantics and a syntactic, non-algorithmic type system. A type soundness theorem has been proved and mechanized using the HOL-4 proof assistant, thereby ensuring that the proof is free from errors. To ensure that the operational semantics accurately models Objective Caml, an executable version of the semantics has been created (and proved equivalent in HOL to the original, relational version) and tested on a number of small test cases.

Note that while we have tried to make the semantics accurate, we are not part of the OCaml development team - this is in no sense a normative specification of the implemented language.

From a team including Peter Sewell (Acute, HashCaml, Ott).

I continue to believe that things are heating up nicely in mechanized metatheory, which, in the multicore/multiprocessor world in which we now live, is extremely good news.

The Carnap Programming Language

Carnap is a general purpose programming language for the next generation of many-core devices, many many-core systems and their applications. It introduces a process oriented programming model that allows programmers to separate the concerns: Carnap programs consist of data structures and the concurrent processes that act upon them.

§2 "The primitive process of a Carnap program is called an action. An action determines a local or shared state. Actions are assembled by construction to form the component processes of a program. Programs consist of concurrent processes that construct and interact via logically shared data structures and resources called Contexts.

§3 In this way the application programmer is able to separate concerns, reasoning separately about the two primary aspects of Carnap programs: potentially large scale data structures and the concurrent processes that act upon them.

§4 Contexts are named, type associative and statically typed.


Carnap was mentioned here before, but I think the website provides more information now; in particular, there are now some example programs to look at.
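For those who want something executable, the closest familiar analogue I can offer is Haskell's STM. This is emphatically not Carnap code, just the flavor of concurrent processes that interact only through a shared, typed structure playing the role of a context.

    import Control.Concurrent (forkIO, threadDelay)
    import Control.Concurrent.STM

    main :: IO ()
    main = do
      jobs <- newTVarIO ([] :: [Int])                    -- the shared "context"
      _ <- forkIO (atomically (modifyTVar' jobs (1 :)))  -- one process
      _ <- forkIO (atomically (modifyTVar' jobs (2 :)))  -- another process
      threadDelay 100000                                 -- crude: let them run
      readTVarIO jobs >>= print                          -- observe shared state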

Inductive Synthesis of Functional Programs: An Explanation Based Generalization Approach

Inductive Synthesis of Functional Programs: An Explanation Based Generalization Approach
Emanuel Kitzelmann, Ute Schmid; JMLR Volume 7(Feb), 2006.

We describe an approach to the inductive synthesis of recursive equations from input/output-examples which is based on the classical two-step approach to induction of functional Lisp programs of Summers (1977). In a first step, I/O-examples are rewritten to traces which explain the outputs given the respective inputs based on a datatype theory. These traces can be integrated into one conditional expression which represents a non-recursive program. In a second step, this initial program term is generalized into recursive equations by searching for syntactical regularities in the term. Our approach extends the classical work in several aspects. The most important extensions are that we are able to induce a set of recursive equations in one synthesizing step, the equations may contain more than one recursive call, and additionally needed parameters are automatically introduced.
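To see the two steps on a toy case (my reconstruction, not an example from the paper), consider inducing last from three examples:

    -- Step 0: input/output examples for an unknown function f.
    examples :: [([Int], Int)]
    examples = [([1], 1), ([2,3], 3), ([4,5,6], 6)]

    -- Step 1 rewrites each output as a trace over its input, using
    -- operations from the datatype theory:
    --   f [1]     = head [1]
    --   f [2,3]   = head (tail [2,3])
    --   f [4,5,6] = head (tail (tail [4,5,6]))
    -- Step 2 spots the syntactic regularity (one extra 'tail' per element)
    -- and generalizes it into a recursive equation:
    f :: [Int] -> Int
    f (x:xs) | null xs   = x
             | otherwise = f xs
    f []                 = error "no example covers the empty list"

    main :: IO ()
    main = print (all (\(i, o) -> f i == o) examples)  -- True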

Is this the future of programming?

Samurai - Protecting Critical Data in Unsafe Languages

Samurai - Protecting Critical Data in Unsafe Languages. Karthik Pattabiraman, Vinod Grover, Benjamin G. Zorn.

Programs written in type-unsafe languages such as C and C++ incur costly memory errors that result in corrupted data structures, program crashes, and incorrect results. We present a data-centric solution to memory corruption called critical memory, a memory model that allows programmers to identify and protect data that is critical for correct program execution. Critical memory defines operations to consistently read and update critical data, and ensures that other non-critical updates in the program will not corrupt it. We also present Samurai, a runtime system that implements critical memory in software. Samurai uses replication and forward error correction to provide probabilistic guarantees of critical memory semantics. Because Samurai does not modify memory operations on non-critical data, the majority of memory operations in programs run at full speed, and Samurai is compatible with third party libraries. Using both applications, including a Web server, and libraries (an STL list class and a memory allocator), we evaluate the performance overhead and fault tolerance that Samurai provides.

An acceptable overhead for the pleasure of using an unsafe language? You decide.
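For flavor, here is a tiny Haskell sketch of the replication half of the idea. It is purely illustrative: Samurai itself is a C/C++ runtime and adds forward error correction on top of replication, and the names below are mine, not its API.

    import Data.IORef

    -- Three replicas of one critical cell.
    data Critical a = Critical (IORef a) (IORef a) (IORef a)

    newCritical :: a -> IO (Critical a)
    newCritical x = Critical <$> newIORef x <*> newIORef x <*> newIORef x

    -- A critical store updates every replica.
    criticalWrite :: Critical a -> a -> IO ()
    criticalWrite (Critical r1 r2 r3) x = mapM_ (`writeIORef` x) [r1, r2, r3]

    -- A critical load votes: if a stray non-critical write corrupts one
    -- replica, the other two still agree on the last critical value.
    -- (With no majority at all, we can only pick arbitrarily.)
    criticalRead :: Eq a => Critical a -> IO a
    criticalRead (Critical r1 r2 r3) = do
      [a, b, c] <- mapM readIORef [r1, r2, r3]
      return (if a == b || a == c then a else b)

    main :: IO ()
    main = do
      cell <- newCritical (0 :: Int)
      criticalWrite cell 42
      criticalRead cell >>= print   -- 42, even if one replica were clobbered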